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Abstract 

Coarse structural nested mean models provide a useful tool to estimate treatment 
effects from longitudinal observational data with time-dependent confounders. However 
there is no existing guidance to specify the treatment effect model, and model misspe- 
cification can lead to biased estimators, preventing valid inference. To test whether the 
treatment effect model matches the data well, we derive a goodness-of-fit test procedure 
based on overidentification restrictions tests. We show that our test statistic is doubly- 
robust in the sense that with a correct treatment effect model, the test statistic has the 
correct level if either the treatment initiation model or a nuisance regression outcome 
model is correctly specified. In a simulation study we show the test procedure has 
correct type-I error and is powerful to detect model misspecification. In addition, we 
apply the test procedure to study how the timing of combination antiretroviral treat¬ 
ment initiation after infection predicts the one year treatment effect in HIV-positive 
patients with acute and early infection. 

Keywords and phrases: Causal inference; Censoring; Doubly robust; Estimating 
Equation; Coodness-of-fit test; HIV/AIDs; Longitudinal observational study; Overiden- 
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tification restrictions test. 


1 Introduction 

The gold standard for evaluating effects of interventions are randomized controlled trials. 
However, they are not always available; for example, for evaluating a treatment strategy for 
HIV-positive patients, a randomized controlled trial would force patients to take the treat¬ 
ment or to be off the treatment regardless of their health status. Observational studies are 
useful in these settings. In observational studies, there often is time-dependent confounding 
by indication; some covariates are predictors of both the subsequent treatment and outcome, 
and are also affected by the past treatment history. Then, standard methods adjusting for 
the covariates history are fallible and can lead to bias (Robins et ah, 1992; Robins, 2000; 
Robins et ah 2000). 

Coarse structural nested mean models (Robins, 1998a) provide a useful tool to estim¬ 
ate treatment effects from longitudinal observational data. Lok and DeGruttola (2012) de¬ 
veloped a time-dependent version of coarse structural nested mean models and applied it to 
investigate the impact of the timing of combination antiretroviral treatment initiation on the 
effect of one year treatment in HIV-positive patients. Their semiparametric method leads 
to an infinite number of unbiased estimating equations and a huge class of consistent and 
asymptotically normal estimators. An optimal estimator can be derived within this class of 
coarse structural nested mean models under well-specified models for the treatment effect, 
treatment initiation, and a nuisance regression outcome model, in an unpublished 2014 tech¬ 
nical report available from the second author. The key assumption lies in a well-specified 
model for the treatment effect. However, no guidance exists on how to specify the treatment 
effect model, and model misspecification may lead to biased estimators, preventing valid 
inference. 

The main contribution of this article is to derive a goodness-of-fit test statistic for testing 
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correct specification of the treatment effect model. The key insight is that with a correctly- 
specified treatment effect model we have more unbiased estimating equations than the num¬ 
ber of parameters, which results in overidentification of the parameters. Overidentification 
restrictions tests, also called Sargan tests or J-tests (Sargan, 1958 and Hansen, 1982), are 
widely used in the econometric literature; they however seem to have been previously un¬ 
noticed in the biostatistics literature. The standard overidentification restrictions test, given 
by the minimized value of the generalized method of moments (Newey and McFadden, 1994; 
Imbens et ah, 1995) criterion function, has a chi-squared limiting distribution, with degrees 
of freedom equal to the number of overidentification restrictions. In most situations, the 
minimum of the generalized method of moments criterion is obtained by a continuous iter¬ 
ative procedure to update the parameter estimates until convergence (Hansen et ah, 1996). 
Arellano and Bond (1991) showed the test statistic based on one-step estimates other than 
the optimal generalized method of moments estimates is not robust and tends to over-reject 
even in large samples. Our test procedure is different from the standard over identification 
restrictions tests in this regard. We do not obtain parameter estimates by minimizing an 
objective function, but rather we obtain parameter estimates by solving the optimal estimat¬ 
ing equations with the number of equations equal to the number of parameters. The overly 
identified restrictions are only used for testing, not for estimation. This difference allows 
us to greatly reduce the computation burden. Our simulation studies show that our test 
statistic has correct size for large samples under the scenarios we considered. Another merit 
of the overidentification restrictions test is that no bootstrap is needed to compute the test 
statistic, which could be valuable with the large samples that are increasingly common. 
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2 Motivating problem and basic setup 


2.1 The motivating problem 

Combination antiretroviral treatment is the standard initial treatment for HIV, and has 
considerably reduced the morbidity and mortality in HIV-positive patients. In the HIV 
literature, hndings imply that there are key early events, during acute and early infection, in 
the pathogenesis of HIV infection that determine the long-term pace of disease progression 
(Hecht et al., 2006). However, there is no strong evidence to support when to start treatment 
in patients in the acute and early stages of infection. It is important to understand the 
effect of initiating treatment at different times during the course of HIV infection. This 
investigation relies on an observational study, where we emulate a counterfactual experiment 
using causal models. 

2.2 The Acute Infection and Early Disease Research Program 

The Acute Infection and Early Disease Research Program study is a multicenter, observa¬ 
tional cohort study of 1762 HIV-positive patients diagnosed during acute and early infection 
(Hecht et al., 2006). Dates of infection were estimated based on a stepwise algorithm that 
uses clinical and laboratory data (Smith et ah, 2006). We included patients with CD4 and 
viral load measured within 12 months of the estimated date of infection, which resulted in 
1696 patients. Let m denote the number of months between the estimated date of infection 
and combination antiretroviral treatment initiation (m = 0,..., 11), where 0 indicates the 
estimated date of infection. We are interested in evaluating the impact of m on the effect of 
one year treatment. 

2.3 Notation 

Let Yfc be the patient’s CD4 count at month k since the estimated date of infection {k = 
0,..., iP -|- 1 = 24), and be a vector of covariates measured at month m, including age. 
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gender, race, injection drug use, CD4 count and viral load. Let be one if the patient was 
on treatment at month m and zero otherwise. We assume that once treatment is started, 
it is never discontinued. We use overbars to denote a variable history; for example. Am is 
the treatment history until month m. Let T be the actual treatment initiation time. The 
patients are assumed to be an independent sample from a larger population (Rubin, 1978), 
and for notational simplicity we drop the subscript i for patients. To handle missingness, 
for L, if this is missing at month m, Lm was coded as “missing”. For intermediate missing 
outcomes, we imputed W by interpolation; if the outcome is missing just prior to onset of 
treatment, we imputed W by carrying the last observation forward. Let X = {Ak-, Lk, bir+i) 
denote the patient’s full record. Until Section 6 we assume all patients are followed up until 
month K + 1. 

Let be the CD4 count at month k, possibly counter factual, had the patient started 
treatment at month m. Let be the CD4 count at month k had the patient never started 
treatment during the course of follow up. We assume the patient’s observed outcome W is 
equal to the potential outcome for m equal to the actual treatment initiation time T ; 
that is, li k > T, Yk = Y^f^ and if T > /c, W = Ylf°\ 

We assume the assumption of no unmeasured confounding (Robins et ah, 1992); 

y4°o) jj I (fc = m + 1,..., m + 12), (1) 

where 11 denotes “is independent of” (Dawid, 1979). This assumption holds if Lm contains 
all prognostic factors for that affect the treatment decision at month m. For example, 
if patients with lower CD4 counts initiated treatment earlier, the assumption (1) would fail 
to hold if Lm does not include the history of the CD4 count. 
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2.4 Coarse structural nested mean model 


We model the treatment effect, comparing treatment starting at month m to never starting 
among the subgroup of patients with covariate history and T = m, as 

I ^ T = m}^ itjjra) (k = m,... ,m + ll), (2) 

where is the parameter in the treatment effect model. From now on, we consider 7 ^ ^(^m) = 
("01 + 'ip 2 fT^){k — m)l(m<fc), with {k — m) the duration of treatment from month m to month 
k. We restrict the range of k from 12 to iF + 1, whereby we avoid making extra modeling 
assumptions beyond the necessary ones to estimate in order to gain robustness. 

Particularly, 7 ™ quantifies the effect of one year treatment if HAART was initiated at 

month m, among the subgroup of patients with covariate history If outcome is the CD4 
count and > 0, the effect of one year treatment is beneficial. 12'^i quantifies the 

effect of one year treatment if treatment was started at the estimated date of infection, and '02 
quantifies the increase of the effect of treatment for each month of delay after the estimated 
date of infection. Under this model, the treatment effect is homogeneous. In practice, the 
treatment effect may vary among different groups; for example, male and female patients 
may have different responses to the combination antiretroviral treatment. We can then 
extend the model as 7 m,^(^m) = ( 0 i + 02 ur + 03 gender)(fc — where 03 quantifies 

the magnitude and direction of the impact of gender. 

For /c = 12,..., iP + 1, define H{k) = Yk — 7 y(LT)- As proved in Robins et al. (1992), 
Lok et al. (2004) and Lok and DeGruttola (2012), H{k) is mimicking a counterfactual out¬ 
come in the sense that for k = 12,..., K + 1 and m = k — 12,..., k — 1, 

E{H (k) I 0^5 Am-l = 0, Am} = E{Y^°°'^ \ = O 5 ^m}, (3) 

since by subtracting from the observed 10 the average effect of treatment, we would obtain 
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the quantity that mimics the outcome had the patient not been treated. The implication 
of (3) and the assumption of no unmeasured confounding is that for k = 12,..., K + 1 and 
m = k — 12,... ,k — 1, 

E{H[k) I Lm, = 0, Am} = E{H(k) \ L^, = 0}, (4) 

which plays a key role for estimation. 

2.5 The conditional probability of treatment initiation 

We use a pooled logistic regression to model the probability of treatment initiation at 
month m, conditional on the past history, pg{m) = P{Am = 1 | A^-i = = 

l(A^_i=o)lvisit(?^)/[l + exp{—0^/(Zm)}], where lvisit(?^) is an indicator of whether a visit 
took place at month m, and f{Lm) is some function of Lm- Let Jtrt(0)(W) denote the estim¬ 
ating function for 9q. 

2.6 The unbiased estimating equation and optimal estimation 

Model (2) cannot be fit by standard regression methods because it involves potential out¬ 
comes. However, one can get consistent estimates by constructing unbiased estimating 
equations based on (4) (Lok and DeGruttola, 2012): for any measurable, bounded function 
Qm : Pm —t fc = 12 ,..., iL -|- 1 , m = /c — 12 ,..., /c — 1, let 

K+l k-l 

qm{Lm)[H^{k)-E{H^{k)\Lm,Am-i = ^}]{Am-Ve{m)}. 

k=12 m=k—12 

We use empirical process notation throughout. We let P denote the probability measure 
induced by X and let denote the empirical measure induced by Xi,...,Xn. Given a 
measurable function / : A” M, we write P„/(X) = n~^ fi^i) Pf{X) for the 
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expectation under P. Then 


Pn{ Jtrt{e) (^) } ~ 0 (^) 

are the stacked unbiased estimating equations for both the parameter ^|J and the (nuisance) 
parameter 6. For simplicity, we will suppress the dependence of the estimating functions 
on X; for example, PnG{ip,e,q) is shorthand for PnG(^fi^q){X). Sometimes, we also drop the 
dependence on the parameters. 

In theory, q can be chosen arbitrarily; however it largely influences the precision of the 
resulting estimator. To derive the optimal estimating equation, and therefore the optimal 
estimator, we assume that for fc, s = 12,..., JF + 1 and m with m = max(fc — 12, s — 
12),..., min(fc — 1, s — 1, JF), 

COy{H (fc), H{s) I Zm, Ara-l = 0, Am} 11 Am \ Lm, Am-l- (6) 


This assumption is a natural extension of (4). This would be true under Robins (1998b)’s 
rank preservation assumption and (T)!°°\ Ri°°^) 11 Am | Lm, Am-i- However, the rank preser¬ 
vation assumption is unlikely to hold in practice and the assumption (6) is weaker. 

The optimal estimating equations, within the class of PnG^^^^q) indexed by q for any meas¬ 
urable and bounded functions can be obtained by finding q°'^^ that satisfies E{d/G 

P{G{■>pofio,q)Gj^^ for any q (Newey and McFadden, 1994). Then, under (6) 
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q^iLm) = {var(ifm | A^-i = 0)} 

^ 1-^ f I Pm, Am-l = 0, Am = 1 ) — E (-—Hm \ Lm, Am = 0 


dip 


dip 


(7) 


where qPL^ = {q^L^’P ..., q^L^'A with I = max(m -|- 1,12) and r = min(m -|- 12, JF -|- 1), 
which are informed by the fact that m-|-l<fc<m-|-12 and 12 < k < K + 1] 
Hm = • • •)-H^(^)}^; 'va.Ti^Hm I Lm,Am-i = 0) is a matrix with elements F^ = 


cov I Lm,Am-i = 0}; and E^d/d^ipH^ \ Lm,Am-i = 0,Am = l)-E{d/d'ipHm \ 
L^, = 0) = {Al, ..., with Ai = Eid/d'ipH^k) I L^, Am-i = 0, A^}-E{d/dijH4k) 

Lmy 1 0 }' 

Remark 1 For the optimal estimating equations, defined by (5) and (7), we address two 
issues. One issue is that E{H.,p{k) \ Lm,Am-i = 0} and depend on rf. Following 
Lok and DeGruttola (2012), we use a preliminary consistent estimate iljp to replace ipo in 
E{FI.^I,{k) I Lm, Am-i = 0} and The rationale for this replacement is that the estimating 
equations are unbiased for any fixed value of ifp when pefm) is correct. If is linear in 
if, does not depend on if. One choice ofifp is the optimal estimator if is only non¬ 
zero for k = m + 12. Another issue is that and E{H.,jj[k) \ Lm,Am_i = 0}, even with 
if in lieu of ifp, depend on the true unknown distribution. We will use parametric models 
to approximate these quantities. Let E{H.ip^{k) \ = 0} be parametrized by ^i; 

for example E^^{H.^p{k) | Lm,Am-i = 0} zs a linear regression model with covariates L^. 
Likewise, let A^ be parametrized by ^2- Denote estimating functions for ^2 and ifp by 
J 2 {^ 2 ) ^p(hp,6)- 5'mce Gp depends on A™+^^, it is also a function of (, 2 . 

3 Asymptotic results of optimal estimators 

We present the consistency and asymptotic normality result of the optimal estimator. These 
results are the building blocks to derive the goodness-of-fit test statistic. 

Theorem 1 (Consistency) LetG*^^ be optimal estimating functions 

K+l k-l 

“EE 'tX.Wrn)[H^,(k) - I = 0}]{>1„ -p«(m)}, 

k=12 m=k—12 

and J2(5,) JtrmV 

ating functions stacking all estimating functions together, where Gp, Ji and J 2 are defined 
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in Remark 1, and is defined in (5). Let be the solution to estimating equa¬ 

tions PnU{'ii;,ii}p,s_,e) = 0. The true parameter values are fio, and 6 q. Under the regularity 
conditions (C1)-(C2) specified in the Supplementary Material, if the treatment effect model 
is wcll Specified, and either E^^{H,jj{k)\Lm, Am-i = 0} or pe^m) is well specified, 
^ 0 in probability, as n ^ oo. 

Theorem 2 (Asymptotic normality) Under the regularity conditions (Cl)-(C5) specified 
in the Supplementary Material, rtfP{fi) — ipo) —)■ Np (0, in distribution, as n ^ oo, where 

p is the dimension offio and is thepxp upper left matrix in {Pd/d{fi,'ipp,f^,9)Uf~"^P{U 

Remark 2 In the statistics literature, estimators solving unbiased estimating equations are 
often called Z-estimators. The theory of consistency and asymptotic normality of Z-estimators 
is well established, see for example Theorem 5.9 and Section 5.3 in Van der Vaart (2000). 
We skip the detailed proof but explain the regularity conditions needed to guarantee the con¬ 
sistency in the Supplementary Material. From Theorem 1, the functional form of 7^ ^ 
must be correctly specified. In contrast, the estimator remains consistent for if either 
Ei. {H.,p{k)\Lm, Am-i = 0} or peijn) is well specified, but not necessary both. The estim¬ 
ator is doubly robust (Robins and Rotnitzky, 2001; Van der Laan and Robins, 2003). The 
functional form of the nuisance models can be selected on the basis of the observed data, as 
well as the literature and subject knowledge specific to the application setting. Later in this 
article, we provide a more specific illustration in the context of our example. 

4 Goodness-of-fit test 

The consistency, asymptotic normality, and donble robnstness of estimators rely on a key 
assnmption, that is, the treatment effect model is well specihed. Misspecihcation of the treat¬ 
ment effect model canses bias in the parameter estimation and break down the asymptotic 
resnlts. We now develop tests for model specihcation based on overidentification restrictions 
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tests. Conceptually, for a well specified model, a new set of unbiased estimating functions, 
other than the optimal ones that are used for estimation, evaluated at the optimal estimat¬ 
ors, should be asymptotically concentrated at zero. This asymptotic behavior leads to the 
following theorem: 


Theorem 3 (Goodness-of-Fit Test) Let the treatment effect model be ^{Im) and H^(k) 
dfc—7T’,p(^r)- Choose a set of functions {q^{Lm) € k = 12,..., K+l,m = /c —12,..., k — 
1} that are different from the optimal choice 

K+l k-l 

^ ^ ^ ^ ?m,52 (^) I Cmi -^m—1 0}]{^m Pe('^)}- 

k=12 m=k—12 


Let he as in Theorem 1. The null hypothesis is Hq: 7m(^m) well specified, and 

either E^,,^{H^{k) \ Lm,Am-i = 0} or poim) is well specified. Under Hq and the regular¬ 
ity conditions (Cl)-(CIO) specified in the Supplementary Material, the goodness-of-fit test 
statistic 


GOF = n{Pr,G 




YT.-^PnG 






( 8 ) 


in distribution, as n ^ oo, where S is the variance of *h(^o,7o,?o,0o) ^(po,bo,€o,0o) 

asymptotic linear representation of G^,p.,p which is a linear combination of G*, G, Gp, 
Ji, J 2 and Jtrt, defined in (6) in the Supplementary Material, and S is the sample variance 


We state the key steps in the proof, with details in the Supplementary Material. To 
establish the asymptotic distribution of ^ 0 ), and therefore that of GOF, a key 

step is to linearise g 0 ) as Pn^(ipo,^o,^o,&o) some <h, whereby we can apply 

the typical central limit theorem. To do so, the Lipschitz condition (C7) implies the functions 
form a Donsker class. Using Lemma 19.24 of Van der Vaart (2000), we have 


'/^i.Pn P)G{^o,ipo,^Q,eo) T np(l). 


(9) 


11 


Next, we apply a Taylor expansion to ^ 10 ) hand side of (9) and use the fact 

that = 0 on the right hand side of (9). Finally, we can express $(^ 0 ,^ 0 , 50 , 00 )) 

the asymptotic linear representation of <^(^^^ 50 ), to be a linear combination of G*, G, Gp, 
Ji, J 2 and Jtrt- 

Remark 3 (Double Robustness) The goodness-of-fit test statistic is doubly robust in the 
sense that for (8) to hold we only require that either E^^{H^[k)\Lmi ^m-i = 0 } or pg(m) is 
well specified, not necessary for both. This property adds a protection from possible misspe- 
cification of the nuisance models. 

Remark 4 The standard overidentification restrictions test is min,^ ?7,P„V^^j{S('0)}“^P„V(^), 
where V(^) = ( G*^^ asymptotic variance of 

In most situations, the minimum is obtained by a continuous iterative procedure to update 
the parameter estimates; that is, ifd+T = arg min^ ? 7 ,P„V’^^^{S('?/^^*^)}“^P„V(^) until conver¬ 
gence (Hansen et al, 1996). Our test procedure does not need any iterative procedure, which 
simplifies the calculation. 

Remark 5 (Choosing q) Just like a naive choice of q in estimating equations may lead 
to an estimator with large variance and thus useless inference, an arbitrary choice of q may 
lead to the goodness-of-fit test lacking of power. We propose the following procedure to choose 
q, which is powerful in certain circumstances. Suppose we have two models to choose from 
for the treatment effect. Let the null model be 7 ^, which is the treatment effect model we 
are testing for, and the other model to be an alternative model 7 ^. ITe can derive A*, q*°P^^ 
A, and as in (7) with 7 ^ and fiy. Note that is used for optimal estimation of the 
parameters in the null model. Then, candidates for q are A*, A, or any subvector of 
these that is not included in q*°^^. Our simulation study shows that the goodness-of-fit test 
with is most powerful among this set of candidates in detecting the alternative model. 
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5 Extension of goodness-of-fit test in the presence of cen¬ 


soring 

We use the Inverse-Probability-of-Censoring-Weighting technique (Robins et ah, 1995 ; Hernan et ah, 
2005 ; Lok and DeGruttola, 2012 ) to accommodate patients lost to follow-up. Let Cp = 0 
indicate a patient remains in the study at month p. Following Lok and DeGruttola ( 2012 ), 
we assume that censoring is missing at random; that is, {L,A) 11 Ck+i \ Lk,Ak.,Ck = 0 , 
whereby we have P{Ayn = 1 | = (5,Gm = 0) = P{Ara = 1 | Lm,Am-i = 0), and 

Pe{m) does not depend on censoring. Dehne the Inverse-Probability-of-Gensoring-Weightingg 
version of estimating functions and G'^ using weights p = l/{np=m+i~ ^ I 
Lp_i, Rp_i, Gp-i = 0)}, see the Supplementary Material for details. In calculation of the 
weights, we use a pooled logistic regression model to estimate Pp(Gp = 0 | Lp-i, Gp-i = 

0). We assume the censoring model is well specihed with estimating functions Jcen(r?)- Simil¬ 
arly, we have the Inverse-Probability-of-Gensoring-Weighting version of the estimating func¬ 
tion for the preliminary estimator ^jJp, denoted by G^. For the nuisance regression outcome 
models, the regression was restricted to patients still in follow-up and use weighted regression 
analysis with the censoring weights. 

Dehne the goodness-of-ht test statistic as 


GOF" = n{F„G 




}^(E^)-iF„G 




where is the sample variance of ^ ho So »?o) asymptotic linear 

representation of Supplementary Material. As proved 

in the Supplementary Material, subject to regularity conditions, GOF^ has an asymptotic 
chi-squared distribution, with degrees of freedom dimension of G^. 
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6 Simulations 


The simulation designs were based on the Acute Infection and Early Disease Research Pro¬ 
gram database, but we did not consider censoring. Following an unpublished 2014 tech¬ 
nical report available from the second author, we hrst generated the CD4 count outcomes 
under no treatment, followed by a treatment initiation time T, and lastly the ob¬ 
served outcomes {k = 6 ,..., 30), as follows: (i) In each sample, 2 groups were sim¬ 
ulated: injection drug users (10%) and patients who never used drugs (90%), and then 
log A ^(6 ■ 0 , 0 • 4^) for injection drug users, and A ^(6 ■ 6 , 0 • 5^) for non injection drug 

users. For k > 6, = —10-|-+efc+i, where ~ A^(0, a1) with (jfc = 52 • 375 — 1 ■ 625fc 

for fc = 7,..., 19 and = 21 • 5 for k = 20,..., 30; (ii) T was generated by a logistic regres¬ 
sion model logit{P(T = m | T > m, Tm)} = —2 • 4 — 0 • 42injdrug — 0 ■ 00351",!°°^ — 0 ■ 026m, 
where injdrug is an indication of being an injection drug user; and (iii) 

We considered different models for 7 ^. 

The performance of the test statistics was assessed by their ability (i) to conhrm the 
adequacy of a model that is correctly specihed with the data-generating model (type-I error) 
and (ii) to reject a misspecihed model (power). The model under the null hypothesis Hq 
upon which the goodness-of-ht statistic is based, and the alternative hypothesis Ha were 
specihed as Hq : 7 ^.^ = [ipi -f ip 2 m){k - m) versus Ha : 7 ^^^ % (V'l + i> 2 ^){k - m). Six 
scenarios regarding the true treatment effect model, Hq and a parametric specihcation of Ha 
were specihed as follows: 

Scenario (a): True: 7 ^,^ = (25 — 0 ■ 7m){k — m), Hq : 7 ^^^ = (-^1 -1- 'ilj 2 m){k — m), and 
Ha ■ 7m ,7 = (^1 + + ^3"i^)(fc - m)] 

Scenario (b): True: 7 ^_^ = (25-0-7m)(/c-m), Hq : 7 ^_^ = (V'i+t^2’^+'03injdrug)(/c-m), 
and Ha : 7m ,7 = (^1 + + ipzm^){k - m); 

Scenario (c): True: 7 ^_^ = (35 —l-lm-|-0-04m^)(/c —m), Hq : 7 ^,^ = ['ilji+'ilj 2 m){k — m), 
and Ha : 7 m ,7 = (V'l + '^ 2 ^ + '4)Qm7‘){k - m); 

Scenario (d): True: 7 ^ ,^ = (35 —l-lm-|-0-04fc^)(fc —m), Hq : 7 ^ ,^ = (- 01 -f-^ 2 ?^)(fc —?^), 
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and Ha : 7m,v- = (V'l + ^ 2 m + ip‘im^){k - m); 

Scenario (e): Trne: 7 ^,^ = (25 — m + 0 ■ 03m^)(fc — m), Hq : 7 ^,^ = (-^i + 'ip 2 fn){k — m), 
and Ha : 7 ^^^ = (V's + 'il^Am){k - 

Scenario (f): Trne: 7 ^_^ = (10 — 1 • lm){k — Hq : 7 ^,^ = (-01 + ip 2 m){k — m), and 

Ha : 7m,= (V'a + iJim){k - m)^/\ 

Specifically, in Scenarios (a) and (b), Hq is correctly specified. In Scenarios (c)-(f), Hq 
is misspecified with different degrees of departnre from the trne model. In Scenarios (c) and 
(d), Hq is nested in the parametric specification of Ha- In Scenarios (e) and (f), Hq is not 
nested in the parametric specification of Ha- 

Type-I error and power were estimated by the freqnency of rejecting Hq using 1,000 
simulated datasets. We considered the following choices of q: (i) = 1, which is a naive 

choice for comparison; (ii) = A^; and (iii) (ii) and (iii) were derived under 

the parametric specification of Ha- 

The optimal estimator was obtained by solving (5) with (7). In (5), the treatment 
initiation model was fitted by a logistic regression model adjusting for Y^, injection drug use, 
and month, restricted to patients and visits with Am-i = 0- Thus, the treatment initiation 
model was correctly specified. E{H^^{k) \ Am-i = 0,Lm} was fitted by a linear regression 
model adjusting for CDd^ and {k — m), restricted to patients and visits with A^-i = (). The 
covariates were motivated by (3) and the data generating mechanism. The nuisance models 
in are specified in the Supplementary Material for simplicity of presentation, which do 
not affect the double robustness of the estimator. 

In addition to the goodness-of-fit test statistic, we considered an elaborated-model-fitting- 
and-testing approach, which combines the null model and the parametric specification of 
the alternative model and tests the significance of the parameters corresponding to the 
alternative model. 

From Scenarios (a) and (b) in Table 1, where the treatment effect model under Hq is 
correctly specified, the goodness-of-fit test procedure with all choices of q controls type- 
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I error for n = 1,000 and n = 2,000. This suggests that the chi-squared distribution 
derived in this article provides an accurate approximation to the hnite sample behavior of 
the goodness-of-£t test statistic for these sample sizes. 

From Scenarios (c)-(f), where the treatment effect model is not correctly specihed, the 
goodness-of-£t test procedure with the optimal derived under the parametric specihcation 
of Ha is most powerful, and as the sample size increases, the power increases, conhrming 
the theoretical results. From Scenarios (c) and (d), the goodness-of-ht test procedure and 
the elaborated-model-htting-and-testing approach are comparable when testing nested mod¬ 
els. In both scenarios, the goodness-of-£t test procedure is slightly more powerful than 
the elaborated-model-htting-and-testing approach for n = 500 and n = 1, 000, which is 
not apparent for n = 2,000. For Scenarios (e) and (f), the null treatment effect model is 
not nested in the parametric specification of Ha- Under Scenario (e), the goodness-of-ht 
test statistic with shows more power than the elaborated-model-htting-and-testing ap¬ 
proach, likely because the elaborated-model-htting-and-testing approach hts a larger model 
and loses power. Under Scenario (f), the goodness-of-ht test statistic with g^^’^ is slightly 
more powerful than the elaborated-model-htting-and-testing approach for n = 500, and both 
approaches are powerful to reject the null model in other cases. 
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Table 1; Type-I error estimates and power estimates (xlOO) for testing the null model 
Hq by the proposed goodness-of-fit (GOF) test statistic with q being 1, A, and and 
the Elaborated Model Fitting and Testing (EMFT) approach over 1,000 simulations under 


Scenarios (a)-(f) 

Type-I error estimates in Scenario (a) 
GOF EMFT 


n\q 

1 

m 

~opt,/c 


500 

5-3 

4-3 

5-2 

4 • 9 

1000 

4-5 

5-7 

5-6 

5-4 

2000 

4-8 

4-4 

5-2 

5-3 


Power estimates 

in Scenario (i 



GOF 


EMFT 

n\q 

1 


~opt,k 

Hm 


500 

15 

29 

59 

56 

1000 

28 

55 

89 

84 

2000 

52 

88 

99 

99 


Power estimates 

in Scenario (i 



GOF 


EMFT 

n\q 

1 


~opt,/c 


500 

12 

25 

49 

28 

1000 

24 

53 

73 

54 

2000 

48 

79 

91 

80 


Type-I error estimates in Scenario (b) 


1 

GOF 

A fc 

~opt,fc 

EMFT 

9-1 

9-8 

12 ■ 3 

13-5 

5-3 

4-4 

5-4 

5-7 

5-2 

4-4 

5-2 

5-3 

Power estimates 

in Scenario (d) 

1 

GOF 

~opt,fc 

Hm 

EMFT 

90 

96 

97 

83 

100 

100 

100 

97 

100 

100 

100 

100 

Power estimates 

in Scenario (f) 

1 

GOF 

A^ 

~opt,fc 

Hm 

EMFT 

93 

99 

100 

96 

100 

100 

100 

100 

100 

100 

100 

100 


7 Application 

We applied the proposed goodness-of-ht test to study how the timing of combination antiret¬ 
roviral treatment initiation after infection predicts the effect of one year of treatment in HIV¬ 
positive patients. We used the Acute Infection Early Disease Research Program database. 
We started with a simple null model for the treatment effect, Hq : ^ = {^jJl+^jJ 2 m){k — m), 

and conducted directed alternative-model tests by testing whether possible effect modifi¬ 
ers should be added into the model. In the HIV literature, it has been found that there 
may be gender differences in immunologic response to combination antiretroviral treatment: 
early studies suggested that clinical disease progression was more rapid in women than men 
with combination antiretroviral treatment (Friedland et ah, 1991; Bozzette et ah, 1998); con¬ 
versely, more recent studies have shown that women have better immunologic outcomes 
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than men on treatment(Maman et ah, 2012; Maskew et ah, 2013). It has also been shown 
that older age is associated with a poorer CD4 connt increase with combination antiret¬ 
roviral treatment (Maman et ah, 2012; Maskew et ah, 2013). Injection drng use has been 
found to be associated with reduced effectiveness of combination antiretroviral treatment 
(Poundstone et ah, 2001). As suggested by the literature, we considered tests directed at 3 
variables: gender, age, and injection drug use. For the test directed at a certain variable Z, 
we calculated the goodness-of-fit test statistic with q being the optimal form derived from 
the parametric specification of the alternative model 7 ^ ,^ = ('^ 1 -|-'^ 2 ?^ + V’ 3 Z)(fc — m)l(fc>m)- 

The nuisance models were specified on the basis of the observed data, the clinical liter¬ 
ature, and subject knowledge. For the censoring model, we used a logistic regression model 
adjusting for square root of current CD4 count (CD4),{^), current log viral load, gender, age, 
injection drug use (injdrug), month, squared month, and whether a patient was treated, as 
discussed in Krishnan et ah (2011) and Lok et ah (2010). For the treatment initiation model, 
we used a logistic regression models including CD4),{^, current log viral load, gender, age, 
injection drug use, month, days since last visit, indication of first visit, indication of second 
visit, race, as discussed in Lok and Griner (2014). For E^^{H{k) \ Lm,Am = 0}, we used a 
regression model adjusting for CD4m, CD4^^(fc —m), CD4)){'‘age(/c —m), CD4^'^race(/c —m), 
CD4^'^injdrug(/c —m), whether there is a CD4 slope measure, CD4slope^(/c—( 6 —m)’*', 
and ( 6 ^ — with a’*' = a x l(a > 0). The model was motivated by (3). The inclusion 

of CD4^, CD4^/^(fc - m), and CD4^/^age(fc — m) was suggested from a stochastic model of 
for each patient i over time k, = q. _|_ 5 /i;_|_^^age-|-^ 2 age/c-|- 01 Fjfc-|-ejfc, where 

Qi is a normal random effect, Wik is a Brownian motion process, eik is normal with mean 
zero and constant variance, and Oj, Wit, eik and age are independent (Taylor et ah, 1994). 
Other covariates were suggested in Taylor and Law (1998) and May et ah (2009). 

Table 2 shows the results from fitting the optimal estimator of the null treatment effect 
model, along with the goodness-of-fit tests directed at gender, age, and injection drug use. 
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The p-values are all greater than 0-05. To avoid the multiple testing problem, we did not 
consider other tests. The three tests were specihed prior to the actual calculation. The results 
show a benefit of combination antiretroviral treatment; for example, starting treatment at 
the estimated date of infection would lead to an expected added improvement in CD4 counts 
of 12^1 = 299 cells/mm^ after a year of therapy. Delaying treatment initiation during acute 
and early infection may diminish the CD4 count gain associated with one year treatment 
{'<p 2 < 0); however, this result is not statistically signihcant. 


Table 2: The Acute Infection Early Disease Research Program data: the optimal estimator 
fitting the null treatment effect model: point estimate (95% conhdence intervals based on 
the asymptotic normality result), along with goodness-of-fit statistics (Statistic), associated 
degree of freedom (DF), and p-values (p-value) for the adequacy of the null model by testing 
whether gender or injection drug use should be added into the model 

V(i(95% Cl) V>2 (95% Cl) 


24-88(21-61, 28-15) 


-0-48(-l-47,0-52) 


Goodness-of-fit test 

Statistic DF 

Test directed at gender 0-99 1 

Test directed at age 0-80 1 

Test directed at injection drug use 2-93 1 


p-value 
0-32 
0-37 
0-09 


8 Discussion 

The applicability of the goodness-of-fit test procedure presented in this article is broad in 
the causal inference literature. The testing procedure can also be developed for the tradi¬ 
tional structural nested mean models (Robins, 1994) other than the time-dependent coarse 
structural nested mean models considered in this article, and marginal structural models 
(Robins, 2000), because both approaches yield overidentification of the parameters. For 
the Inverse-Probability-of-Censoring-Weighting estimator of marginal structural models, un¬ 
biased estimating equations are P„g(R){F — /i(A, V)}w{A | L) = 0, where Y is the outcome 
at the end of study, A is the treatment history, R is a subset of the baseline covariates, 
/i(a, V) = E{Y°' I V) is the marginal structural model, where F® is the counterfactual out- 
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come had every individual received the treatment a, and w{A \ L) is the inverse of conditional 
probability of receiving the actual treatment A given L. These equations are unbiased for 
most choices of q{V), leading to a large class of unbiased estimating equations. The literature 
of structural nested mean models and marginal structural models concentrates on estimation 
and efficiency. Little attention has been given to goodness-of-fit tests. Our test procedure 
for the treatment effect model can be developed in these contexts in the same manner. 

Our goodness-of-fit test procedure can also deal with treatment of the form “initiate 
treatment when the CD4 count first drops below x”. Especially in resource limited coun¬ 
tries, and historically also in the US, initiation of combination antiretroviral treatment is 
decided based on the CD4 count threshold. Orellana et ah (2010) proposed dynamic regime 
marginal structural models and Lok et ah (2007) used structural nested mean models to 
simultaneously compare dynamic treatment regimes of this form and estimate the optimal 
one. Due to the popularity of these methods, the development of goodness-of-fit tests in 
these settings will be useful for model diagnosis and protect causal estimates from biases 
introduced by misspecification of the treatment effect model. 
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